Abstract

An autonomous mobile robot with a vision-based target acquisition system must be able to find and maintain fixation on a moving target while the system itself is in motion. Most animals, including humans, achieve this capability, but it has proven difficult for artificial systems. We propose that efficient and extensible solutions to the target acquisition and maintenance problem may be found when the machine sensor-effector control algorithms emulate the mechanisms employed by biological systems. In nature, motion provides the foundation for visual target detection, acquisition, tracking, and trailing (pursuit). We present in this paper a summary of some simple and robust visual-motion-based mechanisms we have developed to solve these problems, and describe their implementation in an autonomous, visually controlled mobile robot.

1  Introduction 

The objective of this research is to develop an autonomous mobile robot capable of visual target detection, tracking, trailing, and obstacle avoidance. Specifically, the robot is tasked with following a human walking through an office complex. For a demonstration of autonomy, all sensor-effector loops must be completed on the robot, without external assistance in the form of target designation or environmental modeling. The robot must accomplish this task without any explicit a priori knowledge of the floor plan and without any special codings or markings in the environment, including any special treatment of the target. Vision is the only means by which the robot is permitted to gain information about the external environment. Further, only visual motion information is used.


2  Algorithms 

We fitted a mobile robot with a video camera, a pan and tilt mechanism, an on-board computer, and biologically based visual-motor control algorithms. The basic information that we made available to the robot controllers through the vision system was motion, contained in the sequence of video frames. Using this information the robot can detect targets while either stationary or in transit. The motion analysis algorithms, developed in earlier work [Blackburn et al., 1987], were enhanced to allow separation of unique target motion from the collateral optic flow accompanying the movement of the robot through a visually complex environment. The modifications included the use of center-surround receptive fields to minimize the optic flow created by the transiting robot and enhance the unique target motion.

2.1  Functional  Description 

Figure 1 diagrams the various visual-motor functions which perform our tracking, trailing, and obstacle avoidance tasks. The behavior of the animate target determines the behavior of the robot. Unique motion in the periphery causes a visual reorienting reflex (saccade) which moves either a processing window within the available visual space (small saccades) or the entire camera pan and tilt unit (large saccades), placing the center of the visual field (fovea) on the center of mass of a moving target. A large saccade is also performed when the processing window reaches the limit of the image frame. A large saccade generates a window recentering command.

Figure 1. Visual-motor functions and relationships

A smooth pursuit reflex, which takes input from motion in the foveal region, keeps the fovea centered on the acquired moving target. The optokinetic reflex, which responds to full field motion, stabilizes the eye when the body is in motion.

Reorientation of the robot to trail an acquired target is accomplished by basing commands to the robot drive motors on the camera pan angle, requiring the robot to drive in the direction of the gaze. This process is analogous to the targeting motion of the eyes, head, and body in biological systems.

Trailing is accomplished by triggering forward thrust of the robot when the predominant motion of a centered target is toward the center of the visual field (contracting motion field). Collision is avoided by decreasing forward thrust when the target motion is away from the center (expanding). Obstacle avoidance is achieved by decreasing thrust on the side of the robot opposite to the peripheral motion away from the center of the visual field.


The obstacle avoidance reflex, which is transitory, assumes precedence over the pursuit reflex, allowing the robot to skirt around obstacles in pursuit of a target.

For  an  in-depth  discussion  of  the  biological  visual 
processes  from  which  we  derived  our  algorithms, 
see  Blackburn  et  al.  [1993]. 

2.2  Receptive  Fields  And  Log  Polar  Mapping

As a basis for motion analysis, sequential frame subtraction is performed. The differences are taken between the current frame (R) and the previous frame (H), resulting in both "on" ($B_1$) and "off" ($B_0$) elements.

$$B_0 = \max(0,\, R - H)$$

$$B_1 = \max(0,\, H - R). \qquad [1]$$

The "on" elements indicate light intensity increasing in a localized region, while the "off" elements indicate decreasing intensity. The output matrix is organized into local receptive fields and submitted to a log-polar transformation [Blackburn, 1993a], in which the receptive field centers are placed proportionally further apart with increasing distance from the receptor matrix center, and the receptive field radii also increase proportionally with that distance.
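The frame subtraction step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `frame_difference` and the nested-list image format are assumptions, and the "on" and "off" names follow the prose definition (an "on" element marks an intensity increase from the previous frame to the current one).

```python
def frame_difference(current, previous):
    """Split the signed frame difference into rectified "on" and "off" maps.

    current, previous: equally sized 2-D lists of pixel intensities
    (hypothetical format chosen for this sketch).
    """
    on, off = [], []
    for row_cur, row_prev in zip(current, previous):
        # "on": intensity increased since the previous frame
        on.append([max(0, c - p) for c, p in zip(row_cur, row_prev)])
        # "off": intensity decreased since the previous frame
        off.append([max(0, p - c) for c, p in zip(row_cur, row_prev)])
    return on, off


previous = [[10, 10], [10, 10]]
current = [[10, 30], [4, 10]]
on, off = frame_difference(current, previous)
print(on)   # [[0, 20], [0, 0]]
print(off)  # [[0, 0], [6, 0]]
```

Keeping the two rectified channels separate preserves the sign of the change, which a single absolute difference would discard.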

The  log-polar  transform  is  accomplished  by: 

$$G_{i,j} = \frac{1}{p} \sum_{a,b} B_{a,b} \quad \text{s.t.} \quad \lVert (a,b) - (x,y) \rVert \le RF_r, \qquad [2]$$

where i and j are the coordinates in the transformed map, a and b are coordinates of the elements ($B_{a,b}$) located within the local receptive fields, x and y are locations of the local receptive field centers in the receptor matrix, and p is the variable number of elements in the local receptive fields. $RF_r$ is the radius of the local receptive fields, defined by:

$$RF_r = \gamma \cdot E, \qquad [3]$$

where $\gamma$ is a constant computed as $(2 \cdot (1 - \cos(2\pi/m)))^{1/2}$ to ensure that, for m local receptive fields at any given eccentricity, the radius of each local receptive field reaches the center of the next local receptive field on the circumference.

The eccentricity (E) of a local receptive field, defined as the location of the field center relative to the center of the receptor matrix, varies exponentially with the serial position from the center along the radius of the receptor matrix (with the constraint that the finite packing density of elements near the center forces each radius to be at least one element diameter greater than the previous).

$$E = \max(i,\, \exp(\lambda \cdot (i/n))), \qquad [4]$$

where i is the serial distance on a radius from the receptor matrix center (from 1 to n), n defines the number of local receptive fields to be located on a radius from the receptor matrix center, and $\lambda = \log(N/2)$, with N/2 representing the number of receptors (or pixel elements) available along the receptor matrix radius.

The x,y locations of the local receptive field centers on the receptor matrix are determined by

$$x = (N/2) - E \cdot \sin\theta \qquad [5]$$

$$y = (N/2) + E \cdot \cos\theta, \qquad [6]$$

where $\theta$ is incremented from $\pi/2$ to $5\pi/2$ by $2\pi/m$. The locations of receptive field centers from one eccentricity to the next are staggered by $\pi/m$, so that a slightly asymmetric hexagonal matrix of receptive field centers results.
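The layout geometry of equations [3] through [6] can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function name `receptive_field_layout` is hypothetical, each center is taken to lie at distance E from the matrix center, and the stagger is implemented by offsetting every other ring by pi/m.

```python
import math


def receptive_field_layout(N, n, m):
    """Return a list of (x, y, radius) tuples for an N-by-N receptor matrix.

    n: number of receptive fields along a radius (rings).
    m: number of receptive fields per ring.
    """
    gamma = math.sqrt(2.0 * (1.0 - math.cos(2.0 * math.pi / m)))  # constant in eq. [3]
    lam = math.log(N / 2.0)                                        # constant in eq. [4]
    fields = []
    for i in range(1, n + 1):
        ecc = max(i, math.exp(lam * i / n))   # eccentricity, eq. [4]
        radius = gamma * ecc                  # field radius, eq. [3]
        stagger = (math.pi / m) * (i % 2)     # assumed: alternate rings offset by pi/m
        for s in range(m):
            theta = math.pi / 2.0 + stagger + s * 2.0 * math.pi / m
            x = N / 2.0 - ecc * math.sin(theta)   # eq. [5]
            y = N / 2.0 + ecc * math.cos(theta)   # eq. [6]
            fields.append((x, y, radius))
    return fields


fields = receptive_field_layout(N=128, n=16, m=32)
print(len(fields))  # 512
```

With this choice of the constant, the distance between neighboring centers on a ring equals the field radius, which is the overlap property the text describes.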

The averaging of pixels in receptive fields emphasizes large-magnitude effects. This is a desirable feature in building reliable artificial vision systems and may have been part of the reason for its adoption by nature.
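A small numerical sketch shows why field averaging favors large effects. The helper `field_average` and the 5 by 5 test images are hypothetical; the function simply averages all elements within a given radius of a field center, in the spirit of equation [2].

```python
import math


def field_average(image, cx, cy, radius):
    """Average all elements of a 2-D list that lie within `radius` of (cx, cy)."""
    total, count = 0.0, 0
    for a, row in enumerate(image):
        for b, value in enumerate(row):
            if math.hypot(a - cx, b - cy) <= radius:
                total += value
                count += 1
    return total / count if count else 0.0


# An isolated single-element change (for example sensor noise) ...
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 90
# ... versus a change that fills the whole field (for example a moving object).
patch = [[90] * 5 for _ in range(5)]

print(field_average(spike, 2, 2, 1.5))  # 10.0
print(field_average(patch, 2, 2, 1.5))  # 90.0
```

The nine-element field attenuates the isolated spike by a factor of nine while passing the extended change unchanged.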

2.3  Motion  Analysis 

We have found that the log-polar transformation, which is also found in biological visual systems, greatly simplifies motion analysis. On the computational surface that has undergone a log-polar transformation, a centered target will cause an optic flow that moves in parallel in one direction for the receding condition, and in parallel in the opposite direction for the expanding (looming) condition.

Peripheral receptive fields are large and set far apart compared to the central receptive fields. Thus, the center of the receptor surface is more sensitive to slow motion, while the peripheral region is more sensitive to fast motion. The direction of motion on the log-polar plane can be assessed using a simple compare-to-threshold approach combined with feed-forward facilitation or feedback inhibition relative to the preferred direction of the motion analyzer [Blackburn et al., 1987].

The direction of motion is determined by dynamic filtering. The filter elements (MAI, MAII, and MAIII) are defined by:

$$MAI(t) = C_1 \cdot MAI(t-1) + \sum_{ij} G_{ij}, \qquad [7]$$

$$MAII_{ij}(t) = C_2 \cdot MAII_{ij}(t-1) + G_{ij}, \qquad [8]$$

$$MAIII_{u,ij}(t) = C_2 \cdot MAIII_{u,ij}(t-1) + \sum_k (1/k) \cdot G_{i-k,j}, \qquad [9]$$

$$MAIII_{d,ij}(t) = C_2 \cdot MAIII_{d,ij}(t-1) + \sum_k (1/k) \cdot G_{i+k,j}, \qquad [10]$$

where u indicates a filter element supporting the detection of upward motion on the transformed map, d indicates downward motion, i and j index the location of elements, k indexes the offset of input elements in the +/- vertical directions, and C1 and C2 are constants of persistence (1.0 > C1 > C2 > 0). Upward motion on the transformed map results from motion toward the center of the receptor surface, while downward motion on the transformed map results from motion away from the center. (On- and off-center pathways are processed in parallel until the final output, when their products are combined. These equations are shown only for the on-center activity.)

The input to the motion analysis subnetwork on the subsequent increment of time is then passed through to the direction of motion detectors ($MAIV_{u,ij}$ and $MAIV_{d,ij}$) based upon the filter activity,

$$MAIV_{u,ij} = \max(0,\, C_3 \cdot MAI - MAIII_{u,ij}) \cdot G_{ij} + \sum_k \max(0,\, MAII_{i+k,j} - C_3 \cdot MAI) \cdot (1/k) \cdot G_{ij}, \qquad [11]$$

$$MAIV_{d,ij} = \max(0,\, C_3 \cdot MAI - MAIII_{d,ij}) \cdot G_{ij} + \sum_k \max(0,\, MAII_{i-k,j} - C_3 \cdot MAI) \cdot (1/k) \cdot G_{ij}, \qquad [12]$$

where C3 is a gain constant (1.0 > C3 > 0). Equations [7] through [10] are duplicated for the off-center activity, and added to $MAIV_{u,ij}$ and $MAIV_{d,ij}$ as in equations [11] and [12].

One output of the motion analysis subnetwork ($MAVI_{u,ij}$ and $MAVI_{d,ij}$) is the net positive difference of the opposite direction of motion detectors,

$$MAVI_{u,ij} = \max(0,\, MAIV_{u,ij} - MAIV_{d,ij}) \qquad [13]$$

$$MAVI_{d,ij} = \max(0,\, MAIV_{d,ij} - MAIV_{u,ij}). \qquad [14]$$

Another output of the motion analysis subnetwork ($MAV_{u,ij}$ and $MAV_{d,ij}$) is a measure of the motion contrast between the center and the surround of a local region. The sums of the local motion detectors in a neighborhood are taken for the opposing directions and compared. The largest represents the net or most likely direction of motion due to self movement through the environment. If the direction of motion of the center of the neighborhood is consistent with this net motion, then the center can be ignored; otherwise it likely signals unique target motion.

$$MAV_{u,ij} = \begin{cases} MAVI_{u,ij}, & \text{if } \sum MAVI_{u} < \sum MAVI_{d} \\ 0, & \text{else} \end{cases} \qquad [15]$$

$$MAV_{d,ij} = \begin{cases} MAVI_{d,ij}, & \text{if } \sum MAVI_{d} < \sum MAVI_{u} \\ 0, & \text{else} \end{cases} \qquad [16]$$

The outputs of the MAV elements are sent to the target acquisition subnetwork, while the outputs of the MAVI elements are sent to the approach and avoidance subnetworks (described below).
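The opponent stage and the center-surround comparison of equations [13] through [16] can be sketched as follows. The function names and the flat-list neighborhood are assumptions made for illustration; the paper does not specify the neighborhood shape.

```python
def opponent(maiv_u, maiv_d):
    """Net positive difference of opposed direction detectors (eqs. [13], [14])."""
    return max(0.0, maiv_u - maiv_d), max(0.0, maiv_d - maiv_u)


def unique_motion(mavi_u, mavi_d, center):
    """Center-surround motion contrast (eqs. [15], [16]).

    mavi_u, mavi_d: opponent outputs for every field in one neighborhood.
    center: index of the neighborhood's central field.
    The center's response is kept only if its direction is the minority
    direction in the neighborhood.
    """
    sum_u, sum_d = sum(mavi_u), sum(mavi_d)
    mav_u = mavi_u[center] if sum_u < sum_d else 0.0
    mav_d = mavi_d[center] if sum_d < sum_u else 0.0
    return mav_u, mav_d


# Background flows downward; the central field moves upward against it.
print(unique_motion([0.0, 0.0, 0.8, 0.0, 0.0], [1.0, 1.0, 0.0, 1.0, 1.0], 2))  # (0.8, 0.0)
# The central field moves with the background and is suppressed.
print(unique_motion([0.0] * 5, [1.0] * 5, 2))  # (0.0, 0.0)
```

The second call illustrates how flow induced by the robot's own movement is discarded when the center agrees with its surround.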

2.4  Target  Acquisition,  The  Saccade  Reflex 

Targets are detected by a model of the vertebrate optic tectum, using a biased cooperative mechanism between hemifields. The optic tectum determines the center of mass of potential targets and directs motors controlling sensor positioning to bring that center of mass of the potential target to the center of the receptor field. The mechanism employed in the present application differs somewhat from mechanisms previously reported by this group that contribute to the generation of scan paths [Blackburn, 1993b]. Instead of selecting a defined region of the visual space that exceeded all other regions on an activity criterion, the unique motion potentials (from the MAV elements of each receptive field) were weighted by the distance of the receptive field centers from the center of the receptor surface, and integrated separately in each hemifield. This modification brings the model closer to mechanisms implicated by the behavior of amphibians, and somewhat further from mechanisms implicated by the behavior of mammals. The final target location was the vector average computed from the sum of the weighted activity of one or both of the hemifields if that sum exceeded a running global threshold. The advantage of the amphibian model is that it allowed the machine target acquisition subnetwork to select the center of mass of most targets that either occupied space in parts of a hemifield or in parts of both hemifields.

The input to the target detection and centering subnetwork comes from the unique motion detectors ($MAV_{u,ij}$ and $MAV_{d,ij}$). These are weighted by the distances of their locations from the center of the receptor matrix and normalized by the sum of their potentials to find the location of the center of activity for target localization. A bias that is proportional to eccentricity is applied to the input to favor peripheral over central targets.


The input is retinotopically distributed and integrated over time, allowing excitation to build up in a local area,

$$OT\_in_{ij}(t) = C_4 \cdot OT\_in_{ij}(t-1) + W_i \cdot (MAV_{d,ij} + MAV_{u,ij}), \qquad [17]$$

where C4 is a constant of persistence (1.0 > C4 > 0), and $W_i$ is a bias factor that increases with eccentricity (i).

The required X and Y changes in receptor matrix orientation (accomplished by camera pan and tilt commands) to center the matrix on a new target are

$$dX = \sum_{ij} (x\_distance_{ij} \cdot OT\_in_{ij}) \,/\, \sum_{ij} OT\_in_{ij}$$

$$dY = \sum_{ij} (y\_distance_{ij} \cdot OT\_in_{ij}) \,/\, \sum_{ij} OT\_in_{ij}. \qquad [18]$$

Noise is filtered from the subnetwork by disallowing contributions to dX and dY from one hemisphere if the sum of inputs in that hemisphere ($\sum_{ij} OT\_in_{ij}$) is less than a dynamic threshold ($\Theta$). The threshold is increased whenever it is exceeded by the sum of inputs. Otherwise it dissipates like all other potentials with persistence in the network.

$$\Theta = \begin{cases} C_6 \cdot \Theta + C_7 \cdot \sum_{ij} OT\_in_{ij}, & \text{if } \Theta < \sum_{ij} OT\_in_{ij} \\ C_6 \cdot \Theta, & \text{else} \end{cases} \qquad [19]$$

where C6 is the threshold persistence and C7 is a gain factor (1.0 > C7 > C6 > 0).
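The centering computation of equation [18] and the threshold update of equation [19] can be sketched as follows. The function names, the tuple layout, and the choice to split hemifields by the sign of the horizontal distance are assumptions made for this illustration.

```python
def saccade_command(fields, theta):
    """Return (dX, dY) per eq. [18], or None if no hemifield passes the threshold.

    fields: list of (x_distance, y_distance, ot_in) tuples, one per receptive field.
    theta: current value of the dynamic threshold.
    """
    left = [f for f in fields if f[0] < 0]     # assumed: hemifields split by sign of x
    right = [f for f in fields if f[0] >= 0]
    active = []
    for hemifield in (left, right):
        # A hemifield contributes only if its summed input reaches the threshold.
        if sum(f[2] for f in hemifield) >= theta:
            active.extend(hemifield)
    total = sum(f[2] for f in active)
    if total <= 0:
        return None
    d_x = sum(f[0] * f[2] for f in active) / total
    d_y = sum(f[1] * f[2] for f in active) / total
    return d_x, d_y


def update_threshold(theta, total_input, c6, c7):
    """Dynamic threshold of eq. [19]: raised when exceeded, otherwise decaying."""
    if theta < total_input:
        return c6 * theta + c7 * total_input
    return c6 * theta


fields = [(-10.0, 0.0, 0.1), (20.0, 5.0, 3.0), (30.0, -5.0, 1.0)]
print(saccade_command(fields, theta=1.0))  # (22.5, 2.5)
```

In this example the weak activity in the left hemifield falls below the threshold and is excluded, so the command points at the activity-weighted center of the right hemifield only.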

2.5  Target  Tracking  By  The  Smooth  Pursuit  Reflex

Once  acquired,  a  target  must  be  kept  on  the  center 
of  the  receptive  field  where  the  resolution  is  the 
greatest.  The  higher  pixel  density  in  the  center  of 
the  receptive  field  permits  the  early  assessment  of 
the  direction  of  a  target  that  is  moving  slowly. 

The smooth pursuit mechanism receives its input from the motion analysis subnetwork. Due to errors inherent in the mechanical pan and tilt unit, slow pursuit is performed by adjusting the processing window within the available video frame. The rate of change of the video window (dU, dV) is computed by:

$$dU = C_8 \cdot \Big(dU + \sum_{ij} (x \cdot RFr_{ij} \cdot MAVI_{d,ij}) \,/\, \sum_{ij} MAVI_{d,ij}\Big) \qquad [20]$$

$$dV = C_8 \cdot \Big(dV + \sum_{ij} (y \cdot RFr_{ij} \cdot MAVI_{d,ij}) \,/\, \sum_{ij} MAVI_{d,ij}\Big) \qquad [21]$$

where x and y define the quadrant of the location of activity (+/- 1), and C8 is a constant of persistence (1.0 > C8 > 0).
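One axis of the window update in equations [20] and [21] can be sketched as below. The function name `pursuit_step`, the tuple layout, and the handling of the zero-activity case are assumptions for illustration only.

```python
def pursuit_step(d_prev, fields, c8):
    """One update of the window velocity along a single axis (eq. [20] or [21]).

    d_prev: previous rate of change of the window along this axis.
    fields: list of (sign, rf_radius, activity) tuples, where sign is +1 or -1
            for the side of the center on which the field lies.
    c8: constant of persistence, 1.0 > c8 > 0.
    """
    total = sum(activity for _, _, activity in fields)
    if total == 0:
        return c8 * d_prev          # assumed: no motion input, the rate simply decays
    drive = sum(sign * radius * activity for sign, radius, activity in fields) / total
    return c8 * (d_prev + drive)


fields = [(+1, 2.0, 1.0), (+1, 4.0, 1.0), (-1, 2.0, 0.0)]
print(pursuit_step(0.0, fields, c8=0.5))  # 1.5
```

Because each field's contribution is scaled by its radius, activity further from the center produces a proportionally larger correction.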

2.6  Approach/Avoidance  Responses 

While the target is centered in the window, the forward velocity of the robot can be controlled by the advance or retreat of the target. This motion on the optical Z axis is assessed by the opposite directions of motions on the computational plane in the central region. Any motion toward the center of the receptor plane can be considered as a possible retreat of the target and worthy of an approach response, while bilateral motions away from the center indicate a target whose image is growing larger, probably due to its advance upon the platform, and demand a reduction in forward thrust. These reductions are proportional to the location of the motion on the computational surface, such that peripheral locations generate the largest reductions, contributing to collision avoidance.

While the platform is moving through the environment, unilateral image flows away from the center of the receptor surface in the peripheral region indicate the presence of potential obstacles. The required response is to reduce the thrust on the contralateral drive motor, and increase the thrust on the ipsilateral drive motor. When traveling down a corridor with sufficient pattern contrast on the two walls, such a reflex would tend to keep the platform as nearly in the center of the corridor as possible.

The output of the motion analysis subnetwork ($MAVI_{u,ij}$ and $MAVI_{d,ij}$) is also used to control the robot drive motors according to simple rules. Motor commands accumulate and dissipate according to

$$motor_{L,R} = C_5 \cdot input(t-1) + input, \qquad [22]$$

where C5 is the persistence of the input (1.0 > C5 > 0). The input comes from the two visual hemifields and causes an increase or decrease in thrust in both drive motors.


When either visual hemifield detects motion toward the center (indicating a receding target), thrust is increased to both motors inversely proportional to the absolute value of the distance from the center to the location of the motion on the receptor surface

$$input = +\sum_{ij} (max\_dist - |x\_dist_{ij}|) \cdot MAVI_{u,ij}, \qquad [23]$$

where max_dist is the greatest lateral extent of the receptor matrix. The sign of $x\_dist_{ij}$ indicates the location of the motion on the left (-) or the right (+) of center.

When both visual hemifields detect motion away from the center, thrust is decreased to both motors directly proportional to the absolute value of the distance from the center to the location of the motion on the receptor surface

$$input = -\sum_{ij} |x\_dist_{ij}| \cdot MAVI_{d,ij}. \qquad [24]$$

Potential obstacles that are detected by asymmetric optic flow away from the center of the receptor matrix cause increased thrust on the same side (g) and decreased thrust on the side opposite (f) to the optic flow. These changes in thrust are transitory and non-zero only under the conditions of asymmetric optic flow, and during an active forward drive command. The degree of change, resulting in a turn away from the obstacle, is proportional to the net forward thrust.

$$motor_g(t) = motor_g(t-1) + (motor_g(t-1)/max\_thrust) \cdot \sum_{ij} (max\_dist - |x\_dist_{ij}|) \cdot MAVI_{d,ij}, \qquad [25]$$

$$motor_f(t) = motor_f(t-1) - (motor_f(t-1)/max\_thrust) \cdot \sum_{ij} (max\_dist - |x\_dist_{ij}|) \cdot MAVI_{d,ij}. \qquad [26]$$
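The thrust rules of equations [23] through [26] can be sketched as below. All function names and the `(x_dist, activity)` tuple format are hypothetical, and the example values are arbitrary.

```python
def approach_input(fields, max_dist):
    """Eq. [23]: thrust increase from motion toward the center.

    fields: list of (x_dist, mavi_u) tuples; central motion is weighted most.
    """
    return sum((max_dist - abs(x)) * u for x, u in fields)


def retreat_input(fields):
    """Eq. [24]: thrust decrease from bilateral motion away from the center.

    fields: list of (x_dist, mavi_d) tuples; peripheral motion is weighted most.
    """
    return -sum(abs(x) * d for x, d in fields)


def avoid_obstacle(motor_same, motor_opposite, fields, max_dist, max_thrust):
    """Eqs. [25], [26]: steer away from one-sided outward flow.

    motor_same is the motor on the side of the flow (g), motor_opposite the
    other motor (f). The change scales with the current thrust, so a
    stationary robot does not turn.
    """
    flow = sum((max_dist - abs(x)) * d for x, d in fields)
    new_same = motor_same + (motor_same / max_thrust) * flow
    new_opposite = motor_opposite - (motor_opposite / max_thrust) * flow
    return new_same, new_opposite


print(approach_input([(-4.0, 0.5), (4.0, 0.5)], max_dist=64.0))  # 60.0
print(retreat_input([(-40.0, 0.5), (40.0, 0.5)]))                # -40.0
print(avoid_obstacle(50.0, 50.0, [(60.0, 0.5), (62.0, 1.0)], 64.0, 100.0))  # (52.0, 48.0)
```

In the last call the flow appears on one side only, so the motor on that side speeds up and the opposite motor slows, turning the platform away from the obstacle.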

2.7  Orienting  Reflex 

The robot will turn toward a translating target based on the disparity between the axis of the camera and the axis of the robot body. The pan disparity is sensed by counters on the pan axle. It is either negative, indicating a target location on the left of the robot axis, zero, indicating a target location in front of the robot, or positive, indicating a target location on the right of the robot axis. This turning reflex is inhibited by the obstacle avoidance reflex if the required turn is in the direction of the obstacle.

The turn command is transient and inversely proportional to the net forward thrust:

$$motor_L = motor_L(t-1) + pan\_disp \cdot (1.0 - motor_L/max\_thrust), \qquad [27]$$

$$motor_R = motor_R(t-1) - pan\_disp \cdot (1.0 - motor_R/max\_thrust). \qquad [28]$$
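Equations [27] and [28] can be sketched as a single update step. The function name `orient` and the numeric values are assumptions for illustration.

```python
def orient(motor_l, motor_r, pan_disp, max_thrust):
    """Turn the body toward the gaze direction (eqs. [27], [28]).

    pan_disp: camera pan disparity, negative for a target on the left,
              positive for a target on the right.
    The correction shrinks as the current thrust approaches max_thrust.
    """
    new_l = motor_l + pan_disp * (1.0 - motor_l / max_thrust)
    new_r = motor_r - pan_disp * (1.0 - motor_r / max_thrust)
    return new_l, new_r


# Stationary robot, target to the right: the motors are driven in opposite directions.
print(orient(0.0, 0.0, pan_disp=10.0, max_thrust=100.0))    # (10.0, -10.0)
# Robot at half thrust: the same disparity produces half the correction.
print(orient(50.0, 50.0, pan_disp=10.0, max_thrust=100.0))  # (55.0, 45.0)
```

The comparison of the two calls shows the inverse relationship between turn strength and forward thrust that the text describes.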

2.8  Arbitration  Of  Target  Orientation  And  Obstacle  Avoidance

Without a mechanism to prioritize the reflexes, the robot could be forced into an obstacle by the pursuit reflex, or lose track of its target by deflection from an obstacle. Since collision with obstacles must be avoided in most cases, the turning reflex to reduce the pan-axis disparity should be inhibited as long as there is an obstacle in that direction. Yet, in order to maintain a fix on the target, the camera pursuit reflex should be allowed to increase the axis disparity. As long as the window and saccade mechanisms can keep the target in the center of the receptor surface, the platform will move forward on its own body axis. The design of the system ensures that the peripheral vision available to the robot, when its camera has panned to an extreme (as in the case of a target moving behind an obstacle), allows the detection of new obstacles in the forward direction of the platform. Thus, the platform always moves in a direction that it can see. When the original obstacle has been passed, the orienting reflex will be released and the extreme disparity of camera and body axes will cause the platform to turn sharply in the original direction of the target.
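The priority rule described above can be sketched as a simple gate on the turn command. This is a simplified illustration under assumptions: the function name `arbitrate_turn` and the boolean obstacle flags are hypothetical, and the paper's obstacle signal is a graded, transient quantity rather than a flag.

```python
def arbitrate_turn(pan_disp, obstacle_left, obstacle_right):
    """Suppress the orienting turn while an obstacle lies in the turn direction.

    pan_disp: pan disparity (negative = target left, positive = target right).
    Returns the disparity that the orienting reflex is allowed to act on.
    The camera pursuit reflex is not gated here, so the disparity itself
    may keep growing while the body turn is inhibited.
    """
    if pan_disp < 0 and obstacle_left:
        return 0.0
    if pan_disp > 0 and obstacle_right:
        return 0.0
    return pan_disp


print(arbitrate_turn(-10.0, obstacle_left=True, obstacle_right=False))   # 0.0
print(arbitrate_turn(-10.0, obstacle_left=False, obstacle_right=False))  # -10.0
```

Once the obstacle flag clears, the accumulated disparity passes through unchanged, which corresponds to the sharp turn toward the target described in the text.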

3  Hardware

We use a Transitions Research Corporation (TRC) Labmate Mobile Robot Base. A single CCD video camera with a 90 degree field of view, mounted on a pan and tilt mechanism built in-house, provides monocular input to the vision processing hardware. Camera position is taken from shaft encoders located on the pan and tilt axles. Wheel motion information is obtained from encoders located on both left and right drive motor axles. Vision processing hardware includes an Imaging Technologies OFG Frame Grabber coupled to a Hyperspeed Technology coprocessor board with two i860 microprocessors. The vision processing hardware cards are hosted on an 80486 PC located in the robot housing. The PC provides I/O to the Labmate and pan and tilt controllers. The Hyperspeed board receives video data directly from the OFG board at frame rate over an ITI vision bus. One i860 processor is dedicated to subsampling the input frame and making decisions about the required motor responses, while the other i860 processor integrates the visual input into receptive fields and performs motion analysis. Pan, tilt, and drive motor commands are sent to the 80486 for integration and execution.

4  Results 

4.1  Frame  Rate 

Actual processing rate with the algorithms described herein is approximately 8 frames per second.

4.2  Resolution 

The pixel matrix provided to the robot vision system was 128 by 128, distributed evenly over a 68 degree square visual field. This resulted from sampling every third pixel in a 384 pixel square portion from the original 512 by 480 input frame. The 128 by 128 window was selected from within the available data based on smooth pursuit commands. Due to the log-polar mapping, the resolution at the center was roughly 2 sampled pixels per degree of visual angle, while at the periphery the resolution decreased to 0.14 sampled pixels per degree. However, all of the available pixels from the 128 by 128 sample that fell in a receptive field were included in the field average.
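The sampling figures can be checked with a short calculation. The helper name `samples_per_degree` is hypothetical; the inputs are the values quoted in the text.

```python
def samples_per_degree(crop_px, step, field_deg):
    """Sampled pixels per degree for a uniformly subsampled square window."""
    samples = crop_px // step        # 384 pixels, every third one kept
    return samples / field_deg


print(384 // 3)                                       # 128
print(round(samples_per_degree(384, 3, 68.0), 2))     # 1.88
print(round(1.0 / 0.14, 1))                           # 7.1
```

The uniform sampling density of about 1.88 pixels per degree matches the "roughly 2" quoted for the center, and the peripheral figure of 0.14 pixels per degree corresponds to about one sample every 7 degrees.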

4.3  Motion  Sensitivity 

Moving objects can be detected anywhere in the visual field if they cross any of the 128 by 128 sampled pixels. Slow moving targets or targets that change velocity frequently are more likely to evoke responses from the central fields. Conversely, rapidly moving objects are more likely to evoke responses from peripheral fields. The optimal target velocity is a function of field size and frame rate. At a frame rate of 8 frames per second, the optimal velocity of a target translating across the horizontal near the center of the visual space is 4 degrees of visual angle per second. At a distance of ten feet from the camera, this is a speed of about 0.7 foot per second. The optimal velocity for a peripheral location under the same conditions is about seven times greater, or about 5 feet per second.
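The conversion from angular to linear speed quoted above can be reproduced as follows. The helper name `linear_speed` is hypothetical, and the calculation assumes a target crossing perpendicular to the line of sight.

```python
import math


def linear_speed(angular_deg_per_s, distance_ft):
    """Lateral speed (ft/s) of a target seen at `distance_ft` that sweeps
    `angular_deg_per_s` degrees of visual angle per second near the optical axis."""
    return distance_ft * math.tan(math.radians(angular_deg_per_s))


central = linear_speed(4.0, 10.0)
print(round(central, 2))        # 0.7
print(round(7 * central, 1))    # 4.9
print(4.0 / 8.0)                # 0.5
```

The last line gives the angular displacement per frame at 8 frames per second, half a degree, which is close to one sampled pixel per frame at the central sampling density.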

4.4  Behavioral  Capabilities 

Testing was performed in a large partitioned room with an open work area of 32 by 18 feet. Three walls of this work area contained windows, doors, and office furniture. An example of target acquisition and pursuit is shown in the photographs of Figure 2. From a resting position the robot turned and moved forward in pursuit of a human walking into its visual space. Obstacle avoidance was disabled during this demonstration run to allow the robot to approach the cluttered desk. With obstacle avoidance in place the robot tended to approach the target only slowly, until the position of the target allowed the robot a clear run down the center of the floor.

5  Discussion 

We have been able to demonstrate target acquisition, tracking, and trailing with some obstacle avoidance using biologically based algorithms. Several difficulties with functional integration remain. For example, if a target is able to escape the smooth pursuit mechanism and moves out of the central region of the robot's visual field, the obstacle avoidance response will interpret the target as an obstacle and cause the robot to turn away. While the target acquisition mechanism may reacquire the target, the robot can become disoriented. The optokinetic reflex also tends to drive the robot into obstacles. While the arbitration procedure is designed to avoid this, forward motion is restricted when the pan disparity is great (to avoid driving into a blind region), which can eliminate the image flow that clues the robot to the presence of the obstacle. The present system can acquire new targets while on the move, but the target motion required for this is often unrealistic. That is, the signal-to-noise ratio required for segmenting unique target motion from induced motion in the background is still unreasonably high. The paradox is that successful pursuit of the moving target minimizes relative motion of the target on the receptor surface while the pursuit motions of the robot increase induced motion of the background. The acquisition of additional targets is presently inhibited by an increased threshold during pursuit and by competition between central and peripheral regions of the retina. One of the critical problems here, for which we have only a partial solution (equations [15] and [16]), is that of separating unique target motion from the motion of the background during robot transits or camera pans. Biological systems probably have a more flexible mechanism of attention control, permitting frequent and repeated sampling of potential targets, with some additional criteria for target discrimination.

Figure 2. The autonomous visually guided robot trailing a walking human in a cluttered environment.